Toward boosting distributed association rule mining by data de-clustering
Authors
Abstract
Existing parallel algorithms for association rule mining either incur a large inter-site communication cost or require a large amount of space to maintain local support counts for a large number of candidate sets. This study proposes a de-clustering approach for distributed architectures that eliminates the inter-site communication cost for most of the influential association rule mining algorithms. To de-cluster the database into similar partitions, an efficient algorithm is developed that approximates the shortest spanning path (SSP) linking the transactions together. The resulting SSP is then used to de-cluster the transaction data evenly into subgroups. The proposed approach guarantees that all subgroups are similar to each other and to the original group. Experimental results show that data size and the number of items are the only two factors that determine the performance of de-clustering. Additionally, with this approach, most of the influential association rule mining algorithms can be implemented in a distributed architecture with a drastic increase in speed and without losing any frequent itemsets. Furthermore, the data distribution at each de-clustered participant is almost the same as that of a single site, which implies that the proposed approach can also be regarded as a sampling method for distributed association rule mining. Finally, the experimental results show that the originally inadequate mining results can be improved to an almost perfect level.
Keywords: association rule mining, de-clustering, distributed association rule mining, gray code, sampling, shortest spanning path.
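The abstract does not spell out how the SSP is approximated (the keywords mention Gray code, which may play a role in the authors' method). As a minimal sketch under assumptions of its own, the Python code below treats each transaction as a set of items, uses the number of differing items (the Hamming distance between item bit vectors) as the edge weight, approximates the SSP with a greedy nearest-neighbour walk, and then deals the transactions along that path round-robin to k sites. The nearest-neighbour heuristic and the helper names (`distance`, `approx_spanning_path`, `decluster`, `support`) are hypothetical stand-ins, not the authors' algorithm.

```python
from typing import FrozenSet, List

Transaction = FrozenSet[str]


def distance(a: Transaction, b: Transaction) -> int:
    """Hamming distance between two transactions viewed as item bit vectors:
    the number of items present in exactly one of them."""
    return len(a ^ b)


def approx_spanning_path(transactions: List[Transaction]) -> List[int]:
    """Greedy nearest-neighbour approximation of a shortest spanning path
    (an assumed heuristic, not necessarily the paper's algorithm).

    Starts at transaction 0 and repeatedly steps to the closest unvisited
    transaction (ties go to the lowest index). Uses O(n^2) distance
    evaluations and does not guarantee an optimal path.
    """
    n = len(transactions)
    if n == 0:
        return []
    visited = [False] * n
    visited[0] = True
    path = [0]
    for _ in range(n - 1):
        current = transactions[path[-1]]
        nearest = min(
            (i for i in range(n) if not visited[i]),
            key=lambda i: distance(current, transactions[i]),
        )
        visited[nearest] = True
        path.append(nearest)
    return path


def decluster(transactions: List[Transaction], k: int) -> List[List[Transaction]]:
    """Deal transactions to k partitions round-robin along the path, so that
    neighbouring (similar) transactions land on different sites and the
    partition sizes differ by at most one."""
    if k < 1:
        raise ValueError("k must be at least 1")
    parts: List[List[Transaction]] = [[] for _ in range(k)]
    for position, index in enumerate(approx_spanning_path(transactions)):
        parts[position % k].append(transactions[index])
    return parts


def support(transactions: List[Transaction], itemset: Transaction) -> float:
    """Fraction of transactions that contain every item of `itemset`."""
    if not transactions:
        return 0.0
    return sum(1 for t in transactions if itemset <= t) / len(transactions)


if __name__ == "__main__":
    # Hypothetical toy database, for illustration only.
    db = [frozenset(items) for items in (
        {"bread", "milk"}, {"bread", "milk", "eggs"}, {"beer", "chips"},
        {"beer", "chips", "salsa"}, {"bread", "milk"}, {"beer", "chips"},
    )]
    target = frozenset({"bread", "milk"})
    print("whole database support:", support(db, target))
    for site, part in enumerate(decluster(db, k=2)):
        print(f"site {site}: {len(part)} transactions, support {support(part, target):.2f}")
```

Dealing consecutive path positions to different sites spreads each run of similar transactions across all partitions, so every site receives a slice of each neighbourhood in the data. Comparing `support` for an itemset on each partition against the whole database is one simple way to check how closely a partition mimics the original, in the spirit of the sampling claim in the abstract.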
Similar resources
Applying a decision support system for accident analysis by using data mining approach: A case study on one of the Iranian manufactures
Uncertain and stochastic states have always been taken into consideration in the fields of risk management and accidents, as in other fields of industrial engineering, and have made decision making difficult and complicated for managers in selecting corrective actions and control measures. In this research, huge data sets of the accidents of a manufacturing and industrial unit have been st...
New Approaches to Analyze Gasoline Rationing
In this paper, the relation among factors in the road transportation sector from March 2005 to March 2011 is analyzed. Most previous studies take an economic point of view on gasoline consumption. Here, a new approach is proposed in which different data mining techniques are used to extract meaningful relations between the aforementioned factors. The main and dependent factor is gasolin...
Mining the Banking Customer Behavior Using Clustering and Association Rules Methods
The unprecedented growth of competition in banking technology has raised the importance of retaining current customers and acquiring new ones, which makes it important to analyze customer behavior based on bank databases. Analyzing bank databases to understand customer behavior is difficult because they are multi-dimensional, comprising monthly account records and daily t...
An Efficient Approach Generating Optimized Clusters for Theoretic Clustering Using Data Mining
The aim of the data mining process is to extract information from a large data set and transform it into an understandable structure for further use. Data mining is the process of finding anomalies, patterns and correlations within large data sets to predict outcomes. Using a broad range of techniques, you can use this information to increase revenues, cut costs, improve customer relationships,...
Clustered Collaborative Filtering Approach for Distributed Data Mining on Electronic Health Records
Distributed Data Mining (DDM) has become one of the promising areas of Data Mining. DDM techniques include the classifier approach and the agent approach. The classifier approach plays a vital role in mining distributed data, with homogeneous and heterogeneous variants depending on the data sites. The homogeneous classifier approach involves ensemble learning, distributed association rule mining, meta-learning an...
Journal title:
- Inf. Sci.
Volume: 180, Issue:
Pages: -
Publication date: 2010